Subset Modeling (Operator Toolbox)

Synopsis

This operator allows you to create one model per group in your data set.

Description

Often you have a data set with fairly distinct groups, such as age groups or geographic regions. In that case it can be useful to create one model per group instead of one global model. The underlying assumption is that the patterns differ drastically between the groups. This approach may yield better results, especially for non-tree-based models.
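To illustrate the assumption, here is a minimal Python sketch with made-up data: two hypothetical groups whose targets follow opposite linear trends. A single global least-squares line averages the two trends away, while one line per group fits each group exactly. The data and the helpers `fit_line` and `sse` are purely illustrative and are not part of the operator.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a + b*x on the given points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical rows: (group, x, y). Group "A" rises, group "B" falls.
rows = [("A", 1, 2), ("A", 2, 4), ("A", 3, 6),
        ("B", 1, -2), ("B", 2, -4), ("B", 3, -6)]

# One global model over all rows: the opposite slopes cancel out.
xs = [x for _, x, _ in rows]
ys = [y for _, _, y in rows]
a, b = fit_line(xs, ys)
global_error = sse(xs, ys, a, b)

# One model per group: each group is fitted separately.
groups = defaultdict(lambda: ([], []))
for g, x, y in rows:
    groups[g][0].append(x)
    groups[g][1].append(y)
subset_error = sum(sse(gx, gy, *fit_line(gx, gy)) for gx, gy in groups.values())

print(global_error, subset_error)  # 112.0 0.0
```

The global line ends up flat (slope 0), so its error is large, while the per-group lines reproduce both trends with zero error. With real data the gap is rarely this extreme.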

This operator is a meta operator, meaning you define the training "within" the operator. The operator loops over the groups of the given data set. In each iteration, a table containing only one group is provided at the left-hand input port of the inner process. You must create a model inside this operator and deliver it to the upper output port of the inner process.
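The training loop can be sketched in plain Python as follows. This is not the operator's implementation: tables are modeled as lists of dicts, `train_subset_model` and `inner_train` are hypothetical names, and `inner_train` stands in for whatever learner you place in the inner process.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def train_subset_model(table: List[Row], subset_column: str,
                       inner_train: Callable[[List[Row]], Any]) -> Dict[Any, Any]:
    """Train one model per distinct value of subset_column.

    inner_train plays the role of the inner process: it receives the rows
    of exactly one group and must return a model.
    """
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in table:
        groups[row[subset_column]].append(row)
    # One iteration per group, each seeing only that group's rows.
    return {value: inner_train(rows) for value, rows in groups.items()}


# Usage: a trivial hypothetical "learner" that predicts the group's mean label.
def mean_label_learner(rows: List[Row]) -> float:
    return sum(r["label"] for r in rows) / len(rows)


data = [
    {"region": "north", "label": 1.0},
    {"region": "north", "label": 3.0},
    {"region": "south", "label": 10.0},
]
model = train_subset_model(data, "region", mean_label_learner)
print(model)  # {'north': 2.0, 'south': 10.0}
```

The returned dict is the sketch's stand-in for the subset model: it maps each group value to the model trained on that group alone.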

The resulting model is called a subset model. It can be applied to any table that also contains the subset attribute. For each group, the corresponding model is selected and applied to that group's examples. Afterwards, the results are merged back into the complete data table. Note that applying the model does not preserve the order of the examples.

If the subset attribute contains values that were not present during training, the model predicts missing values for those examples.
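Application can be sketched the same way. In this hypothetical Python version, a subset model is a dict from group value to a predict function. Rows are grouped, each group is scored by its own model, and the scored groups are concatenated, so rows come back grouped rather than in input order. Rows whose group was not seen during training get `None` as a stand-in for a missing prediction. The names `apply_subset_model` and `prediction` are assumptions for this sketch, not the operator's API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def apply_subset_model(models: Dict[Any, Callable[[Row], Any]],
                       table: List[Row], subset_column: str) -> List[Row]:
    """Score each row with the model of its group and merge the results."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in table:
        groups[row[subset_column]].append(row)
    result: List[Row] = []
    for value, rows in groups.items():
        model = models.get(value)  # None if this group was unseen in training
        for row in rows:
            prediction = model(row) if model is not None else None
            result.append({**row, "prediction": prediction})
    # Rows are appended group by group, so the input order is not preserved.
    return result


models = {
    "north": lambda row: row["x"] * 2,
    "south": lambda row: row["x"] + 100,
}
table = [
    {"region": "north", "x": 1},
    {"region": "south", "x": 1},
    {"region": "north", "x": 5},
    {"region": "east", "x": 7},  # "east" was not present during training
]
for row in apply_subset_model(models, table, "region"):
    print(row)
```

Here both "north" rows come out together ahead of the "south" row, although the input interleaved them, and the "east" row receives `None` because no model exists for that group.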

Input

  • exa (Data table)

    The table on which to train the subset model.

Output

  • mod

    The subset model, which contains one model per value (group) of the subset attribute.

  • ori (Data table)

    The original data passed through.

  • out (IOObject)

    Output ports that pass data through from the inner process.

Parameters

  • subset column

    The column used to identify the groups within the training table. One model is created per group.

Tutorial Processes

Train a model for each class on the Titanic data set